---
title: "Quick check of IBDMDB metagenomics data subset"
author: David Barnett
date: last-modified
format: html
keep-md: false
theme:
dark: darkly
light: flatly
embed-resources: true
toc: true
toc-location: left
toc-depth: 4
fig-dpi: 200
fig-width: 7.5
fig-height: 5
fig-responsive: true
code-tools: true
code-fold: show
code-link: true
---
```{r}
library (tidyverse)
library (here)
library (phyloseq)
library (seriation)
library (microViz)
```
## Intro
- Focusing on 1 sample per participant fairly arbitrarily chosen as first in visit order
- Checking for any overall compositional differences between Controls and UC and CD patients
- Warnings about lack of counts I'll fix in microViz as they're annoying when the metaphlan profiles are proportions
## Get combined data
```{r}
# also fix/delete messy variables that produce variable name warnings
ps <- read_rds (here ("data" , "phyloseq.rds" ))
ps <- ps %>%
ps_mutate (
ever_bowel_surgery = ..6 ..Have.you.ever.had.bowel.surgery.,
..6..Have.you.ever.had.bowel.surgery. = NULL ,
diarrhea_2w = ..4 ..In.the.past.2. weeks..have.you.had.diarrhea.,
hospitalised_2w = ..5 ..In.the.past.2. weeks..have.you.been.hospitalized.,
diagnosis = factor (diagnosis, levels = c ("CD" , "UC" , "nonIBD" ))
) %>%
ps_select (! contains ("In.the.past.2.weeks" ))
```
- Chuck bad samples (low reads)
```{r}
ps %>% samdat_tbl () %>%
ggplot (aes (x = reads_raw)) +
geom_histogram (bins = 100 ) +
coord_flip () +
scale_x_log10 ()
```
```{r}
ps <- ps %>% ps_filter (reads_raw > 10 ^ 5 )
```
- Compute alpha diversity
```{r}
ps <- ps %>% ps_calc_diversity (
rank = "Species" , index = "shannon" , varname = "shannon_species"
)
```
```{r}
# take only first sample per person (by visit order)
ps1 <- ps %>%
ps_arrange (visit_num) %>%
ps_dedupe (vars = "Participant.ID" , method = "first" )
```
### Write out IDs
```{r}
ps1 %>%
samdat_tbl () %>%
select (visit_num, week_num, PDO.Number, contains ("ID" , ignore.case = FALSE )) %>%
write_csv (here ("data" , "first_visit_samples.csv" ))
```
## Quick plots and analyses
### Compositions
- No obvious differences between groups but quite some variation to explore and discuss.
```{r, fig.width=10, fig.height=10}
ps1 %>%
comp_barplot(
tax_level = "Genus", facet_by = "diagnosis",
n_taxa = 13, merge_other = FALSE,
) +
coord_flip()
```
```{r, fig.width=10, fig.height=10}
cols <- distinct_palette(n = 3, add = NA)
names(cols) <- levels(samdat_tbl(ps1)$diagnosis)
ps1 %>%
ps_mutate(Diagnosis = as.character(diagnosis)) %>%
tax_agg(rank = "Species") %>%
tax_sort(by = sum, trans = "compositional") %>%
tax_transform("clr", rank = "Species") %>%
comp_heatmap(
taxa = 1:40, taxa_side = "right",
tax_anno = taxAnnotation(
Prev. = anno_tax_prev(bar_width = 0.3, size = grid::unit(1, "cm"))
),
sample_anno = sampleAnnotation(
Diagnosis = anno_sample("Diagnosis"),
col = list(Diagnosis = cols), border = FALSE
)
)
```
### CLR PCA
```{r, fig.width=8, fig.height=8}
ps1 %>%
tax_transform("clr", rank = "Species") %>%
ord_calc("PCA") %>%
ord_plot(colour = "diagnosis") +
stat_ellipse(aes(colour = diagnosis))
```
```{r}
# interactive
# ps1 %>%
# tax_transform("clr", rank = "Species") %>%
# ord_calc("PCA") %>%
# ord_explore()
```
```{r, fig.width=8, fig.height=8}
ps1 %>%
tax_transform("clr", rank = "Species") %>%
ord_calc("PCA") %>%
ord_plot(colour = "diagnosis", axes = c(1,3)) +
stat_ellipse(aes(colour = diagnosis))
```
```{r, fig.width=8, fig.height=8}
ps1 %>%
tax_transform("clr", rank = "Species") %>%
ord_calc("PCA") %>%
ord_plot(colour = "diagnosis", axes = c(1,3), plot_taxa = 1:10)
```
### PERMANOVAs
- There is a difference by Aitchison distance on Species level
::: panel-tabset
#### Aitchison Species
```{r}
ps1 %>%
tax_transform ("clr" , rank = "Species" ) %>%
dist_calc ("euclidean" ) %>%
dist_permanova ("diagnosis" ) %>%
perm_get ()
```
#### Bray Curtis Genus
```{r}
ps1 %>%
tax_transform ("identity" , rank = "Genus" ) %>%
dist_calc ("bray" ) %>%
dist_permanova ("diagnosis" ) %>%
perm_get ()
```
#### Binary Jaccard Species
```{r}
ps1 %>%
tax_transform ("binary" , rank = "Species" ) %>%
dist_calc ("jaccard" ) %>%
dist_permanova ("diagnosis" ) %>%
perm_get ()
```
```{r}
ps1 %>%
tax_transform ("binary" , rank = "Species" ) %>%
dist_calc ("jaccard" ) %>%
ord_calc () %>%
ord_plot (axes = 1 : 2 , colour = "diagnosis" ) +
stat_ellipse (aes (colour = diagnosis))
```
:::
### Alpha diversity
#### Compare groups - timepoint 1
- By eye the diversity of some but not all patient samples is lower than controls.
```{r}
ps1 %>%
samdat_tbl () %>%
ggplot (aes (x = diagnosis, y = shannon_species)) +
geom_boxplot () +
geom_jitter ()
```
- Colitis vs controls
```{r}
ps1 %>%
samdat_tbl () %>%
filter (diagnosis != "CD" ) %>%
ggstatsplot:: ggbetweenstats (
x = diagnosis, y = shannon_species
)
```
- Crohn's vs controls
```{r}
ps1 %>%
samdat_tbl () %>%
filter (diagnosis != "UC" ) %>%
ggstatsplot:: ggbetweenstats (
x = diagnosis, y = shannon_species
)
```
#### Check timepoint 1 relation to antibiotics etc?
::: panel-tabset
##### Abx
```{r}
ps1 %>%
samdat_tbl () %>%
ggstatsplot:: ggbetweenstats (x = Antibiotics, y = shannon_species)
```
##### Diarrhea
```{r}
ps1 %>%
samdat_tbl () %>%
ggstatsplot:: ggbetweenstats (x = diarrhea_2w, y = shannon_species)
```
##### Hospitalised
```{r}
ps1 %>%
samdat_tbl () %>%
ggstatsplot:: ggbetweenstats (x = hospitalised_2w, y = shannon_species)
```
##### Bowel surgery ever
```{r}
ps1 %>%
samdat_tbl () %>%
ggstatsplot:: ggbetweenstats (x = ever_bowel_surgery, y = shannon_species)
```
##### Age
```{r}
ps %>%
samdat_tbl () %>%
ggstatsplot:: ggscatterstats (
x = consent_age, y = shannon_species,
point.args = list (mapping = aes (colour = diagnosis))
)
```
:::
#### Diversity over time
- Just for my own interest now, because I haven't done an IBD project before
- Seems like a subset of IBD patients have persistently very low diversity
- Diversity is a crude measure, compositional stability would be more interesting, but I should stop now..
```{r}
ps %>%
samdat_tbl () %>%
add_count (name = "n_samples" , .by = Participant.ID) %>%
filter (n_samples > 3 ) %>%
mutate (mean_shannon = mean (shannon_species), .by = Participant.ID) %>%
ggplot (aes (x = visit_num, y = shannon_species)) +
geom_hline (yintercept = 1.5 , linetype = "dashed" ) +
geom_line (aes (group = Participant.ID, colour = mean_shannon < 1.5 ), alpha = 0.5 ) +
geom_smooth (method = "lm" , colour = "black" ) +
facet_wrap (facets = vars (diagnosis)) +
guides (colour = "none" ) +
theme_bw ()
```
```{r}
ps %>%
samdat_tbl () %>%
group_by (diagnosis, Participant.ID) %>%
summarise (
n = n (), mean_shannon = mean (shannon_species),
sd_shannon = sd (shannon_species)
) %>%
ungroup () %>%
filter (n > 5 ) %>%
ggplot (aes (colour = diagnosis, x = mean_shannon, y = sd_shannon)) +
geom_vline (xintercept = 1.5 ) +
geom_point (aes (shape = mean_shannon < 1.5 )) +
theme_bw ()
```
```{r}
ps %>%
samdat_tbl () %>%
add_count (name = "n_samples" , .by = Participant.ID) %>%
filter (n_samples > 3 ) %>%
mutate (mean_shannon = mean (shannon_species), .by = Participant.ID) %>%
ggplot (aes (
x = reorder (Participant.ID, shannon_species, FUN = mean),
y = shannon_species
)) +
geom_hline (yintercept = 1.5 ) +
geom_boxplot (aes (fill = mean_shannon < 1.5 ), outliers = FALSE ) +
geom_point (size = 1 ) +
facet_grid (cols = vars (diagnosis), scales = "free_x" , space = "fixed" ) +
theme_bw () +
theme (axis.text.x = element_blank ())
```
## Session info
```{r}
sessioninfo:: session_info ()
```